The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of the challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based either on multiple identical models (61%) or on heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
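For readers unfamiliar with the k-fold cross-validation mentioned above, the following is a minimal sketch of how a training set can be split into k train/validation folds. The function name `k_fold_indices` and its parameters are illustrative assumptions and are not taken from the survey or from any challenge submission.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k (train, validation) index pairs.

    Hypothetical helper for illustration only.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one sample.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        # Training indices are all samples outside the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits
```

Each sample appears in exactly one validation fold, so training one model per split and averaging the k validation scores gives a performance estimate that uses every training case once for validation.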
translated by Google Translate
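Point cloud completion, which aims to recover the original shape information from partial point clouds, has attracted attention in the 3D vision community. Existing methods usually succeed in completing standard shapes but fail to generate the local details of point clouds for some non-standard shapes. To obtain the desired local details, guidance from global shape information is essential. In this work, we design an effective way to distinguish standard from non-standard shapes with the help of intra-class shape prototype representations, which can be computed by the proposed supervised shape clustering pretext task and which result in a heterogeneous component w.r.t. the completion network. The representative prototype, defined as the feature centroid of a shape category, provides global shape guidance. This guidance, referred to as soft knowledge, is injected into the downstream completion network in a multi-scale manner through the proposed selective perceptual feature fusion module. Moreover, for effective training, we adopt a difficulty-based sampling strategy that encourages the network to pay more attention to partial point clouds with less geometric information. Experimental results show that our method outperforms other state-of-the-art methods and has a strong ability to complete complex geometric shapes.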
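The abstract defines a prototype as the feature centroid of a shape category. A minimal sketch of that definition follows, assuming features are plain lists of floats and labels are category names. The function `class_prototypes` is hypothetical and is not the authors' implementation.

```python
from collections import defaultdict


def class_prototypes(features, labels):
    """Return {label: centroid}, the mean feature vector of each category.

    Hypothetical helper for illustration only.
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate the per-dimension sum for this category.
        for d, value in enumerate(vec):
            sums[label][d] += value
        counts[label] += 1
    # Divide each accumulated sum by the number of samples in the category.
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }
```

In this sketch, a partial shape whose feature lies far from its category prototype could be treated as non-standard. How the distance is measured and thresholded is left open here.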
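With the rapid development of self-supervised learning (e.g., contrastive learning), the importance of having large-scale images (even without annotations) for training more generalizable AI models has been widely recognized in medical image analysis. However, collecting large-scale, task-specific unannotated data can be challenging for individual labs. Existing online resources, such as digital books, publications, and search engines, provide a new source of large-scale images. However, published images in healthcare (e.g., radiology and pathology) consist of a considerable number of compound figures with subplots. To extract and separate compound figures into usable individual images for downstream learning, we propose a simple compound figure separation (SimCFS) framework that does not require the traditionally needed detection bounding box annotations and that includes a new loss function and a hard case simulation. Our technical contribution is four-fold: (1) we introduce a simulation-based training framework that minimizes the need for resource-intensive bounding box annotations; (2) we propose a new side loss that is optimized for compound figure separation; (3) we propose an intra-class image augmentation method to simulate hard cases; and (4) to the best of our knowledge, this is the first study that evaluates the efficacy of leveraging self-supervised learning with compound image separation. In our results, the proposed SimCFS achieved state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. A self-supervised model pretrained on the large-scale mined figures with a contrastive learning algorithm improved the accuracy of downstream image classification tasks. The source code of SimCFS is publicly available at https://github.com/hrlblab/imageseperation.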
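Vision transformers (ViTs) have pushed the state of the art for various visual recognition tasks through patch-wise image tokenization followed by stacked self-attention operations. Employing self-attention modules results in quadratic complexity in both computation and memory usage. Various attempts to approximate the self-attention computation with linear complexity have therefore been made in natural language processing. However, an in-depth analysis in this work reveals that they are either theoretically flawed or empirically ineffective for visual recognition. We identify that their limitations are rooted in retaining the softmax self-attention during approximation. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Preserving the softmax operation challenges any subsequent linearization effort. Based on this insight, a softmax-free transformer (abbreviated as SOFT) is proposed for the first time. To eliminate the softmax operator in self-attention, a Gaussian kernel function is adopted to replace the dot-product similarity. This enables the full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of our approximation is achieved by computing its Moore-Penrose inverse with a Newton-Raphson method. Further, an efficient symmetric normalization is introduced on the low-rank self-attention to enhance model generalizability and transferability. Extensive experiments on ImageNet, COCO, and ADE20K show that SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with linear complexity, much longer token sequences are permitted in SOFT, resulting in a superior trade-off between accuracy and complexity.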
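Two numerical ingredients named above, a Gaussian kernel in place of dot-product similarity and a Newton-type iteration for the Moore-Penrose inverse, can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the kernel bandwidth, the starting matrix, the iteration count, and all function names are choices made for this sketch, and the low-rank decomposition and symmetric normalization from the abstract are not shown.

```python
import math


def gaussian_kernel_matrix(tokens):
    """Pairwise similarity exp(-||x_i - x_j||^2 / 2) between token vectors.

    The bandwidth (the factor 1/2) is an assumption of this sketch.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-0.5 * sq_dist(a, b)) for b in tokens] for a in tokens]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def newton_pinv(m, iterations=20):
    """Approximate the Moore-Penrose inverse with X <- X (2I - M X)."""
    n_rows, n_cols = len(m), len(m[0])
    norm_1 = max(sum(abs(m[i][j]) for i in range(n_rows)) for j in range(n_cols))
    norm_inf = max(sum(abs(v) for v in row) for row in m)
    if norm_1 == 0.0:
        raise ValueError("matrix must be non-zero")
    # Starting from M^T / (||M||_1 * ||M||_inf) keeps the iteration convergent,
    # because ||M||_1 * ||M||_inf bounds the squared largest singular value.
    scale = norm_1 * norm_inf
    x = [[m[i][j] / scale for i in range(n_rows)] for j in range(n_cols)]
    for _ in range(iterations):
        mx = matmul(m, x)
        correction = [
            [(2.0 if i == j else 0.0) - mx[i][j] for j in range(n_rows)]
            for i in range(n_rows)
        ]
        x = matmul(x, correction)
    return x
```

The iteration uses only matrix products, which is why this family of methods suits accelerator hardware better than an explicit decomposition. The number of iterations needed grows with the condition number of the matrix.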
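A self-driving perception model aims to extract 3D semantic representations from multiple cameras collectively into the bird's-eye-view (BEV) coordinate frame of the ego car in order to ground the downstream planner. Existing perception methods often rely on error-prone depth estimation of the whole scene, or learn sparse virtual 3D representations without the target geometry structure. Both remain limited in performance and/or capability. In this paper, we present a novel end-to-end architecture for ego 3D representation learning from an arbitrary number of unconstrained camera views. Inspired by the ray tracing principle, we design a polarized grid of "imaginary eyes" as the learnable ego 3D representation, and formulate the learning process with an adaptive attention mechanism in conjunction with 3D-to-2D projection. Critically, this formulation allows rich 3D representations to be extracted from 2D images without any depth supervision, with a built-in geometry structure that is consistent w.r.t. BEV. Despite its simplicity and versatility, extensive experiments on standard BEV visual tasks (e.g., camera-based 3D object detection and BEV segmentation) show that our model outperforms all state-of-the-art alternatives, with a further advantage from multi-task learning.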
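A good empathetic dialogue system should first track and understand a user's emotion and then reply with an appropriate emotion. However, current approaches to this task focus either on improving the understanding of users' emotions or on proposing better response strategies, and very few works consider both at the same time. Our work attempts to fill this gap. Inspired by task-oriented dialogue systems, we propose a novel empathetic response generation model with emotion-aware dialogue management. The emotion-aware dialogue management contains two parts: (1) emotion state tracking, which maintains the current emotion state of the user, and (2) empathetic dialogue policy selection, which predicts a target emotion and the user's intent based on the results of emotion state tracking. The predicted information is then used to guide response generation. Experimental results show that dynamically managing this information helps the model generate more empathetic responses than several baselines under both automatic and human evaluation.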
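The detection of ancient settlements is a key focus of landscape archaeology. Traditionally, settlements were identified through pedestrian survey, in which researchers physically traversed the landscape and recorded settlement locations. More recently, the manual identification and labeling of ancient remains in satellite imagery has increased the scale of archaeological data collection, but the process remains time-consuming and arduous. The development of self-supervised learning (e.g., contrastive learning) offers a scalable learning scheme for locating archaeological sites using unlabeled satellite and historical aerial images. However, archaeological sites occupy only a very small proportion of the whole landscape, and modern contrastive learning approaches typically yield inferior performance on highly imbalanced datasets, such as when identifying sparsely localized ancient urbanization over a large area from satellite images. In this work, we propose a framework to address this long-tail problem. In contrast to existing contrastive learning approaches, which typically treat labeled and unlabeled data separately, the proposed method reforms the learning paradigm under a semi-supervised setting to fully utilize the precious annotated data (<7% in our setting). Specifically, the highly imbalanced nature of the data is used as prior knowledge to form pseudo-negative pairs by ranking the similarities between unannotated image patches and annotated anchor images. In this study, we used 95,358 unlabeled images and 5,830 labeled images to detect ancient buildings in a long-tailed satellite image dataset. Our semi-supervised contrastive learning model achieved a promising test balanced accuracy of 79.0%, an improvement of 3.8% over state-of-the-art approaches.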
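The result above is reported as balanced accuracy, which is the mean of the per-class recalls and is therefore better suited to long-tailed data than plain accuracy. A minimal sketch of the metric follows. The function name and the toy labels in the usage note are illustrative assumptions, not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones.

    Hypothetical helper for illustration only.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    recalls = []
    for c in sorted(set(y_true)):
        # Recall for class c: correctly predicted c divided by all true c.
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)
```

With nine background patches and one site patch, a model that always predicts background reaches 90% plain accuracy, yet its balanced accuracy is only 50%, because the recall on the site class is zero.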
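Priors play an important role in providing plausible constraints on human motion. Previous works design motion priors following a variety of paradigms under different circumstances, which leads to a lack of versatility. In this paper, we first summarize the indispensable properties of a motion prior and, accordingly, design a framework for learning a versatile motion prior that models the inherent probability distribution of human motion. Specifically, for efficient prior representation learning, we propose global orientation normalization to remove redundant environment information from the original motion data space. In addition, sequence-based and segment-based frequency guidance is introduced into the encoding stage. We then adopt a denoising training scheme to disentangle the environment information from the input motion data in a learnable way, so as to produce consistent and distinguishable representations. Embedding our motion prior into three different tasks, we conduct extensive experiments, and both quantitative and qualitative results demonstrate the versatility and effectiveness of our motion prior. Our model and code are available at https://github.com/jchenxu/human-motion-porion -prior.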
A crucial issue with current text generation models is that they often uncontrollably generate text that is factually inconsistent with their inputs. Limited by the lack of annotated data, existing work on evaluating factual consistency directly transfers the reasoning ability of models trained on other data-rich upstream tasks, such as question answering (QA) and natural language inference (NLI), without any further adaptation. As a result, these metrics perform poorly on real generated text and are heavily biased by their single-source upstream tasks. To alleviate this problem, we propose a weakly supervised framework that aggregates multiple resources to train a precise and efficient factual metric, namely WeCheck. WeCheck first uses a generative model to accurately label a real generated sample by aggregating its weak labels, which are inferred from multiple resources. Then, we train the target metric model with this weak supervision while taking noise into account. Comprehensive experiments on a variety of tasks demonstrate the strong performance of WeCheck, which achieves a 3.4% absolute improvement on average over previous state-of-the-art methods on the TRUE benchmark.
Recent advances in operator learning theory have improved our knowledge about learning maps between infinite-dimensional spaces. However, for large-scale engineering problems such as concurrent multiscale simulation of mechanical properties, the training cost of current operator learning methods is very high. The article presents a thorough analysis of the mathematical underpinnings of the operator learning paradigm and proposes a kernel learning method that maps between function spaces. We first survey modern kernel and operator learning theory and discuss recent results and open problems. From there, the article presents an algorithm for analytically approximating piecewise constant functions on R for operator learning. This suggests that neural operators can feasibly succeed on clustered functions. Finally, a k-means clustered domain based on a mechanistic response is considered, and the Lippmann-Schwinger equation for micro-mechanical homogenization is solved. The article briefly discusses the mathematics of previous kernel learning methods and some preliminary results obtained with them. The proposed kernel operator learning method uses graph kernel networks to arrive at a mechanistic reduced-order method for multiscale homogenization.